Methodological Challenges in Estimating Tone: Application to News Coverage of the U.S. Economy

Authors

  • Pablo Barberá
  • Amber Boydstun
  • Suzanna Linn
  • Ryan McMahon
  • Jonathan Nagler

Abstract

Machine learning methods have made it possible to classify large corpora of text by measures such as topic, tone, and ideology. However, even when using dictionary-based methods, which require few inputs from the analyst beyond the text itself, many decisions must be made before any measure is produced from the text. When coding media, the analyst must decide on the universe of media sources to sample from, as well as the criteria for selecting articles for coding from within that universe. When using a supervised learning method, generating training data involves further decisions: the unit of analysis to code, the choice of coders, the number of articles or units to code, the number of coders per unit, and the method of dealing with multiple codings of a single object. In this paper we consider the many decisions the analyst makes in using machine learning to classify media texts, using as a running example efforts to measure the tone (positive, negative, neutral) of newspaper coverage of the economy, and we highlight our key findings throughout. In particular, we show that the decision of how to choose the corpus matters a great deal. We also introduce coder variance as a simple but novel measure of coder quality, and we demonstrate that this concept can be used to illustrate the varying returns to using multiple coders versus larger sample sizes when constructing a training dataset optimized to produce the best classifier. Finally, we introduce Classifier Training Using Multiple Codings, an improved method of utilizing multiple codings of individual objects, and demonstrate through simulation that it outperforms alternatives.
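
The abstract names "dealing with multiple codings of a single object" as one of the analyst's decisions but does not describe how Classifier Training Using Multiple Codings works. As a hedged illustration of what that decision involves, the minimal sketch below contrasts two generic options: collapsing each object's codings to a single majority label, or keeping every coding as its own training row with weight 1/k (k = number of coders for that object). This is not the authors' method; the toy data and the function names `majority_vote` and `expand_codings` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each article (object) is coded by several coders
# using the abstract's tone scheme (positive, negative, neutral).
codings = {
    "article_1": ["positive", "positive", "neutral"],
    "article_2": ["negative", "neutral", "negative"],
    "article_3": ["neutral", "positive"],
}


def majority_vote(codings):
    """Collapse each object's codings to its single most common label.

    Ties are resolved by whichever tied label appears first, an arbitrary
    choice made only for this sketch.
    """
    return {
        obj: Counter(labels).most_common(1)[0][0]
        for obj, labels in codings.items()
    }


def expand_codings(codings):
    """Keep every coding as its own (object, label, weight) training row.

    Each row is weighted 1/k, where k is the number of coders for that
    object, so every object contributes the same total weight (1.0).
    """
    rows = []
    for obj, labels in codings.items():
        weight = 1.0 / len(labels)
        rows.extend((obj, label, weight) for label in labels)
    return rows


if __name__ == "__main__":
    print(majority_vote(codings))
    for row in expand_codings(codings):
        print(row)
```

Majority voting discards information about coder disagreement. The expanded form preserves it, and the weights can be passed to classifiers that accept per-row weights (for example, the `sample_weight` argument of scikit-learn estimators that support it).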

Similar resources

Application of Satellite Data and Data Mining Algorithms in Estimating Coverage Percent (Case study: Nadoushan Rangelands, Ardakan Plain, Yazd, Iran)

Assessing and monitoring rangelands in arid regions are essential tasks for managing these regions. Nowadays, satellite images offer a relatively economical and fast way to study vegetation at a variety of scales. This research aims to estimate the coverage percent using digital data from the Landsat ETM+ satellite. In late May and early Ju...

Methodological bases of estimating the efficiency of organizational and economic mechanism of regulatory policy in agriculture

Ukrainian agriculture creates 12-14% of GDP. Ensuring the conditions for sustainable economic development implies the use of adequate mechanisms for regulating economic processes by the government. In the process of formation and implementation of the organizational and economic mechanism of regulatory policy, a system of indicators plays an important role in assessing the impact of such policy...

The Power of Propaganda: The Effect of U.S. Government Bias on Cold War News Coverage of Human Rights Abuses

This paper investigates the extent to which strategic objectives of the U.S. government influenced news coverage during the latter part of the Cold War (1976-88). We establish two reduced-form relationships: 1) strategic objectives of the U.S. government cause the State Department to under-report human rights violations of political allies; and 2) these objectives reduce news coverage of human...

The Effects of China's Growth in Manufacturing Sector in the U.S. Economy

This paper investigates the gains from bilateral trade between China and the U.S. in manufacturing sectors when both countries play a role in the asymmetric (biased) growth of international trade. Our model includes a special case of Biased Growth Theory in international trade. We collected labor productivity, export and import data by using classification of manufacturing industries, for U.S...

“Enemies of the People?” Public Health in the Era of Populist Politics; Comment on “The Rise of Post-truth Populism in Pluralist Liberal Democracies: Challenges for Health Policy”

In this commentary, we review the growth of populist politics, associated with exploitation of what has been termed fake news. We explore how certain words have been used in similar contexts historically, in particular the term “enemy of the people,” especially with regard to public health. We then set out 6 principles for public health professionals faced with these situations. First, using th...

Publication date: 2016